Multi-stream recognition of noisy speech with performance monitoring

نویسندگان

Ehsan Variani

Feipeng Li

Hynek Hermansky

چکیده

A prototype multi-stream system with a performance monitor for stream selection is proposed to recognize speech in unknown noise. The speech signal is decomposed into seven band-limited streams. Posterior probabilities of phonemes are estimated by a multi-layer perceptron (MLP) in each of these band-limited streams. Estimated posterior vectors of all 127 combinations (processing streams) of the seven band-limited streams form inputs to a second-stage MLP that estimates posterior probabilities of phonemes in each processing stream. A performance monitor is designed to predict the reliability of individual processing streams based on the outputs from these streams. The top N streams that are least affected by noise are selected and their outputs are averaged to yield the final posterior probability vector used in Viterbi search for the best phoneme sequence. Experimental results show that the proposed technique is effective in dealing with noise.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...

متن کامل

Auditory-based Acoustic Distinctive Features and Spectral Cues for Robust Automatic Speech Recognition in Low-SNR Car Environments

In this paper, a multi-stream paradigm is proposed to improve the performance of automatic speech recognition (ASR) systems in the presence of highly interfering car noise. It was found that combining the classical MFCCs with some auditory-based acoustic distinctive cues and the main formant frequencies of a speech signal using a multi-stream paradigm leads to an improvement in the recognition ...

متن کامل

Improving the performance of MFCC for Persian robust speech recognition

The Mel Frequency cepstral coefficients are the most widely used feature in speech recognition but they are very sensitive to noise. In this paper to achieve a satisfactorily performance in Automatic Speech Recognition (ASR) applications we introduce a noise robust new set of MFCC vector estimated through following steps. First, spectral mean normalization is a pre-processing which applies to t...

متن کامل

Comparative experiments to evaluate the use of auditory-based acoustic distinctive features and formant cues for robust automatic speech recognition in low-SNR car environments

This paper presents an evaluation of the use of some auditorybased distinctive features and formant cues for robust automatic speech recognition (ASR) in the presence of highly interfering car noise. Comparative experiments have indicated that combining the classical MFCCs with some auditory-based acoustic distinctive cues and either the main formant magnitudes or the formant frequencies of a s...

متن کامل

DBN based multi-stream models for speech

We propose dynamic Bayesian network (DBN) based synchronous and asynchronous multi-stream models for noise-robust automatic speech recognition. In these models, multiple noise-robust features are combined into a single DBN to obtain better performance than any single feature system alone. Results on the Aurora 2.0 noisy speech task show significant improvements of our synchronous model over bot...

متن کامل

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have been shown their performance in speech recognition systems for extracting features, and also acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. Convolutive Bottleneck Network (CBN) is a kind of CNNs which has a bottleneck layer among its fully connected layers. The bottleneck fea...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2013

Multi-stream recognition of noisy speech with performance monitoring

نویسندگان

چکیده

منابع مشابه

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Auditory-based Acoustic Distinctive Features and Spectral Cues for Robust Automatic Speech Recognition in Low-SNR Car Environments

Improving the performance of MFCC for Persian robust speech recognition

Comparative experiments to evaluate the use of auditory-based acoustic distinctive features and formant cues for robust automatic speech recognition in low-SNR car environments

DBN based multi-stream models for speech

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

عنوان ژورنال:

اشتراک گذاری